Bootstrapping Noun Groups Using Closed-Class Elements Only
Abstract
The identification of noun groups in text is a well-researched task and serves as a pre-step for other natural language processing tasks, such as the extraction of keyphrases or technical terms. We present a first version of a noun group chunker that, given an unannotated text corpus, adapts itself to the domain at hand in an unsupervised way. Our approach is inspired by findings from cognitive linguistics, in particular the division of language into open-class elements and closed-class elements. Our system extracts noun groups using lists of closed-class elements and one linguistically inspired seed extraction rule for each open class. Supplied with raw text, the system creates an initial validation set for each open class based on the seed rules and applies a bootstrapping procedure to mutually expand the set of extraction rules and the validation sets. Possibly domain-dependent information about open-class elements, such as that provided by a part-of-speech lexicon, is not used by the system, in order to ensure the domain independence of the approach. Instead, the system adapts itself automatically to the domain of the input text by bootstrapping domain-specific validation lists. An evaluation of our system on the Wall Street Journal training corpus used for the CoNLL-2000 shared task on chunking shows that our bootstrapping approach can be successfully applied to the task of noun group chunking.
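The mutual expansion of extraction rules and validation sets described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the closed-class word lists, the single seed rule (an open-class word between a determiner and a closed-class word is taken to be a noun), the representation of a rule as a pair of closed-class neighbours, and the names `seed_nouns`, `bootstrap`, and `min_support` are all assumptions made for illustration. The sketch covers only one open class (nouns) and does not perform the chunking step itself.

```python
from collections import Counter

# Hypothetical closed-class lists; the abstract does not give the actual lists.
DETERMINERS = {"the", "a", "an", "this", "that"}
CLOSED_CLASS = DETERMINERS | {"of", "in", "on", "with", "and", "is", "was", "to"}


def seed_nouns(tokens):
    """Assumed seed rule: determiner + open-class word + closed-class word."""
    nouns = set()
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if prev in DETERMINERS and word not in CLOSED_CLASS and nxt in CLOSED_CLASS:
            nouns.add(word)
    return nouns


def bootstrap(tokens, rounds=3, min_support=2):
    """Alternately grow a set of context rules and a validation set of nouns."""
    nouns = seed_nouns(tokens)
    rules = set()
    for _ in range(rounds):
        # Step 1: count the closed-class contexts (left, right) of validated nouns.
        support = Counter()
        for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
            if word in nouns and prev in CLOSED_CLASS and nxt in CLOSED_CLASS:
                support[(prev, nxt)] += 1
        new_rules = {ctx for ctx, n in support.items() if n >= min_support}
        # Step 2: apply the rules to collect open-class words for the validation set.
        new_nouns = set()
        for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
            if (prev, nxt) in new_rules and word not in CLOSED_CLASS:
                new_nouns.add(word)
        # Stop once neither the rule set nor the validation set grows.
        if new_rules <= rules and new_nouns <= nouns:
            break
        rules |= new_rules
        nouns |= new_nouns
    return rules, nouns
```

Called on a tokenised corpus, `bootstrap(tokens, min_support=1)` returns the learned context rules together with the expanded noun set. A word that never occurs directly after a determiner can still enter the validation set if it shares a closed-class context with a seed noun, which is the domain adaptation effect the abstract describes.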
Similar resources
Interpreting Noun Compounds using Bootstrapping and Sense Collocation
This paper describes a bootstrapping method for automatically tagging noun compounds with their corresponding semantic relations. Our work takes advantage of the collocation of senses of the noun compound constituents and also word similarity. We exploit this to generate a set of noun compounds from a set of previously tagged noun compounds by replacing one constituent of each noun compound wit...
The Prosodic Bootstrapping of Phrases: Evidence from Prelinguistic Infants
The current study explores infants’ use of prosodic cues coincident with phrases in processing fluent speech. After familiarization with two versions of the same word sequence, both 6- and 9-month-olds showed a preference for a passage containing the sequence as a noun phrase over a passage with the same sequence as a syntactic non-unit. However, this result was found only in one of two groups, t...
Bootstrapping Method for Chunk Alignment in Phrase Based SMT
The processing of parallel corpora plays a crucial role in improving the overall performance of Phrase Based Statistical Machine Translation (PBSMT) systems. In this paper, the automatic alignment of different kinds of chunks has been studied, which improves both word alignment and machine translation quality. Single-tokenization of Noun-noun MWEs, phrasal preposition (source side o...
Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
Responding to the need for semantic lexical resources in natural language processing applications, we examine methods to acquire noun compounds (NCs), e.g., orange juice, together with suitable fine-grained semantic interpretations, e.g., squeezed from, which are directly usable as paraphrases. We employ bootstrapping and web statistics, and utilize the relationship between NCs and paraphrasing...
Bootstrapping for Named Entity Tagging Using Concept-based Seeds
A novel bootstrapping approach to Named Entity (NE) tagging using concept-based seeds and successive learners is presented. This approach only requires a few common noun or pronoun seeds that correspond to the concept for the targeted NE, e.g. he/she/man/woman for PERSON NE. The bootstrapping procedure is implemented as training two successive learners. First, a decision list is used to learn the ...
Publication date: 2010